 query formulation


Query-Focused Extractive Summarization for Sentiment Explanation

arXiv.org Artificial Intelligence

Constructive analysis of feedback from clients often requires determining the cause of their sentiment from a substantial amount of text documents. To assist and improve the productivity of such endeavors, we leverage the task of Query-Focused Summarization (QFS). Models of this task are often impeded by the linguistic dissonance between the query and the source documents. We propose and substantiate a multi-bias framework to help bridge this gap at a domain-agnostic, generic level; we then formulate specialized approaches for the problem of sentiment explanation through sentiment-based biases and query expansion. We achieve experimental results outperforming baseline models on a real-world proprietary sentiment-aware QFS dataset.
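As an illustration of the query-document mismatch described above, the following is a minimal sketch of query-focused extractive selection with query expansion. It is not the paper's multi-bias framework: the overlap-based scoring, the `summarize` function, and the expansion terms are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def summarize(sentences, query, expansion=(), k=1):
    """Pick the k sentences sharing the most terms with the (expanded) query.

    `expansion` is a hypothetical list of extra terms added to the query;
    ties are broken by original sentence order.
    """
    query_terms = tokenize(query) | {t.lower() for t in expansion}
    scored = [(len(tokenize(s) & query_terms), i) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]

sentences = [
    "The app is easy to install.",
    "Support took two weeks to answer my ticket.",
    "I like the colour scheme.",
]
query = "Why are customers unhappy?"

# Without expansion, no sentence shares a term with the query,
# so the selection falls back to the first sentence.
plain = summarize(sentences, query)

# With hypothetical sentiment-related expansion terms, the complaint surfaces.
expanded = summarize(sentences, query, expansion=["support", "weeks", "ticket", "slow"])
```

In this toy case the raw query has no lexical overlap with the complaint sentence, which is the kind of gap that query expansion is meant to narrow.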


Vector Ontologies as an LLM world view extraction method

arXiv.org Artificial Intelligence

Large Language Models (LLMs) possess intricate internal representations of the world, yet these latent structures are notoriously difficult to interpret or repurpose beyond the original prediction task. Building on our earlier work (Rothenfusser, 2025), which introduced the concept of vector ontologies as a framework for translating high-dimensional neural representations into interpretable geometric structures, this paper provides the first empirical validation of that approach. A vector ontology defines a domain-specific vector space spanned by ontologically meaningful dimensions, allowing geometric analysis of concepts and relationships within a domain. We construct an 8-dimensional vector ontology of musical genres based on Spotify audio features and test whether an LLM's internal world model of music can be consistently and accurately projected into this space. Using GPT-4o-mini, we extract genre representations through multiple natural language prompts and analyze the consistency of these projections across linguistic variations and their alignment with ground-truth data. Our results show (1) high spatial consistency of genre projections across 47 query formulations, (2) strong alignment between LLM-inferred genre locations and real-world audio feature distributions, and (3) evidence of a direct relationship between prompt phrasing and spatial shifts in the LLM's inferred vector ontology. These findings demonstrate that LLMs internalize structured, repurposable knowledge and that vector ontologies offer a promising method for extracting and analyzing this knowledge in a transparent and verifiable way.


Reassessing Large Language Model Boolean Query Generation for Systematic Reviews

arXiv.org Artificial Intelligence

Systematic reviews are comprehensive literature reviews that address highly focused research questions and represent the highest form of evidence in medicine. A critical step in this process is the development of complex Boolean queries to retrieve relevant literature. Given the difficulty of manually constructing these queries, recent efforts have explored Large Language Models (LLMs) to assist in their formulation. One of the first studies, Wang et al., investigated ChatGPT for this task, followed by Staudinger et al., which evaluated multiple LLMs in a reproducibility study. However, the latter overlooked several key aspects of the original work, including (i) validation of generated queries, (ii) output formatting constraints, and (iii) selection of examples for chain-of-thought (Guided) prompting. As a result, its findings diverged significantly from the original study. In this work, we systematically reproduce both studies while addressing these overlooked factors. Our results show that query effectiveness varies significantly across models and prompt designs, with guided query formulation benefiting from well-chosen seed studies. Overall, prompt design and model selection are key drivers of successful query formulation. Our findings provide a clearer understanding of LLMs' potential in Boolean query generation and highlight the importance of model- and prompt-specific optimisations. The complex nature of systematic reviews adds to challenges in both developing and reproducing methods but also highlights the importance of reproducibility studies in this domain.


Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have revolutionized the field of artificial intelligence, demonstrating text understanding and generation capabilities approaching human levels. However, despite impressive results, the internal functioning mechanisms of these models largely remain a "black box." As Amodei [1] notes in his essay "The Urgency of Interpretability," researchers have limited understanding of why LLMs generate specific responses and how they arrive at their conclusions. This lack of transparency becomes increasingly problematic as LLMs begin to play central roles in economics, technology, and national security. Of particular concern are phenomena such as unpredictable hallucinations, extreme sensitivity to query formulations, and puzzling patterns in the probability distributions of generated tokens. These phenomena not only limit the reliability of LLMs in critical applications but also point to fundamental gaps in our understanding of their operation.


Towards Evaluating Large Language Models for Graph Query Generation

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are revolutionizing the landscape of Generative Artificial Intelligence (GenAI), with innovative LLM-backed solutions emerging rapidly. However, when applied to database technologies, specifically query generation for graph databases and Knowledge Graphs (KGs), LLMs still face significant challenges. While research on LLM-driven query generation for Structured Query Language (SQL) exists, similar systems for graph databases remain underdeveloped. This paper presents a comparative study addressing the challenge of generating queries in Cypher, a powerful language for interacting with graph databases, using open-access LLMs. We rigorously evaluate several LLM agents (OpenAI ChatGPT 4o, Claude Sonnet 3.5, Google Gemini Pro 1.5, and a locally deployed Llama 3.1 8B) using a designed few-shot learning prompt and Retrieval Augmented Generation (RAG) backed by Chain-of-Thoughts (CoT) reasoning. Our empirical analysis of query generation accuracy reveals that Claude Sonnet 3.5 outperforms its counterparts in this specific domain. Further, we highlight promising future research directions to address the identified limitations and advance LLM-driven query generation for graph databases.


To Draw Is Human: Toward No-Code Subgraph Search

Communications of the ACM

Due to the worldwide shortage of developers, growing talent gap, and budgetary challenges faced by small- and medium-sized businesses in hiring software teams, low-code or no-code frameworks are the latest disruption in the business world [1]. For example, SAP recently launched SAP AppGyver, which is a "no-code application development platform that enables developers of all skill levels to create enterprise-ready applications with drag-and-drop simplicity" [5]. The demand for such low-code or no-code frameworks is not limited to software application development but extends to easy access and search of data residing in databases. Specifically, lay users should be able to access them without needing to write a single line of code. However, query languages (QL)--the primary means to access data residing in databases--require end users to be proficient in these languages before they can take advantage of databases for their tasks.


Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

arXiv.org Artificial Intelligence

Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the key strategy to answer research questions in the medical field. To create a high-quality systematic review, complex Boolean queries are often constructed to retrieve studies for the review topic. However, it often takes a long time for systematic review researchers to construct a high-quality systematic review Boolean query, and often the resulting queries are far from effective. Poor queries may lead to biased or invalid reviews, because they fail to retrieve key evidence, or to a substantial increase in review costs, because they retrieve too many irrelevant studies. Recent advances in Transformer-based generative models have shown great potential to effectively follow instructions from users and generate answers based on those instructions. In this paper, we investigate the effectiveness of the latest of such models, ChatGPT, in generating effective Boolean queries for systematic review literature search. Through a number of extensive experiments on standard test collections for the task, we find that ChatGPT is capable of generating queries that lead to high search precision, although at the expense of recall. Overall, our study demonstrates the potential of ChatGPT in generating effective Boolean queries for systematic review literature search. The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews where time is a constraint and trading lower recall for higher precision is often acceptable.
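The precision-recall trade-off mentioned above can be made concrete with a small worked example. This is only a sketch: the study identifiers and the two retrieved sets are invented for illustration and do not come from the paper's test collections.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth: four studies that belong in the review.
relevant = {"s1", "s2", "s3", "s4"}

# A narrow Boolean query retrieves few studies, all of them relevant.
narrow = {"s1", "s2"}

# A broad Boolean query retrieves every relevant study plus eight irrelevant ones.
broad = relevant | {f"x{i}" for i in range(1, 9)}

narrow_p, narrow_r = precision_recall(narrow, relevant)  # precision 1.0, recall 0.5
broad_p, broad_r = precision_recall(broad, relevant)     # precision 4/12, recall 1.0
```

The narrow query misses half of the relevant studies, while the broad query finds them all at the cost of screening eight irrelevant ones. Which side of this trade-off is acceptable depends on the type of review.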